refactor(BA-5861): Share a single DockerStatsStreamer across CPU/Memory plugins #11234

Open
rapsealk wants to merge 5 commits into perf/11219-stream-container-stats from perf/11232-shared-statsstreamer

rapsealk (Member) commented Apr 22, 2026

Closes #11232 (BA-5861)
Refs #11216

Summary

  • Consolidate DockerStatsStreamer ownership onto DockerAgent — one streamer per agent instead of one per intrinsic plugin.
  • Each container now opens a single persistent Docker stats stream instead of two.
  • Agent dispatches start / stop directly via new AbstractAgent._on_container_started / _on_container_destroyed hooks; the old plugin-side notify_container_started / notify_container_destroyed interface is removed.
  • Plugins receive the shared streamer via an attach_stats_streamer() setter called before await super().__ainit__() — the streamer is live before scan_running_kernels() can inject START events into the lifecycle queue on warm restart.
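The ownership model in the summary above can be sketched roughly as follows. This is a minimal illustration, not the actual Backend.AI code: the classes are hypothetical stand-ins that borrow only the names mentioned in this PR (`DockerStatsStreamer`, `attach_stats_streamer`, `_on_container_started`, `_on_container_destroyed`). The `start` / `stop` methods and the `open_streams` set are assumptions made for the example.

```python
from __future__ import annotations


class DockerStatsStreamer:
    """Hypothetical stand-in: tracks one stats stream per container."""

    def __init__(self) -> None:
        self.open_streams: set[str] = set()

    def start(self, container_id: str) -> None:
        # Adding to a set is idempotent: one entry per container at most.
        self.open_streams.add(container_id)

    def stop(self, container_id: str) -> None:
        self.open_streams.discard(container_id)


class StatsPlugin:
    """Hypothetical plugin that reads from a streamer it does not own."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.streamer: DockerStatsStreamer | None = None

    def attach_stats_streamer(self, streamer: DockerStatsStreamer) -> None:
        self.streamer = streamer


class Agent:
    """Hypothetical agent: owns the streamer and dispatches start/stop."""

    def __init__(self, plugins: list[StatsPlugin]) -> None:
        self.stats_streamer = DockerStatsStreamer()
        for plugin in plugins:
            plugin.attach_stats_streamer(self.stats_streamer)

    def _on_container_started(self, container_id: str) -> None:
        self.stats_streamer.start(container_id)

    def _on_container_destroyed(self, container_id: str) -> None:
        self.stats_streamer.stop(container_id)


cpu, mem = StatsPlugin("cpu"), StatsPlugin("mem")
agent = Agent([cpu, mem])
for cid in ("c1", "c2", "c3"):
    agent._on_container_started(cid)
agent._on_container_destroyed("c2")
```

In this sketch both plugins hold the same streamer object, so the number of open streams tracks the number of live containers rather than containers multiplied by plugins.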

Why

At ~50+ containers, the per-plugin-streamer layout hit aiohttp's default limit_per_host=30 connector limit and stacked backoff retries. See #11232 for the full rationale.

Stacked on

#11224 (perf/11219-stream-container-stats). This branch will be rebased onto main once #11224 merges.

Test plan

  • pants test tests/unit/agent:: tests/component/agent/docker:: passes — includes a regression test asserting _stats_streamer is attached before scan_running_kernels runs (warm-restart ordering guard).
  • Warm-restart the agent with running containers; confirm streams reattach without AttributeError.
  • Start ~10 kernels, verify only N streams to dockerd (one per container, not two) via ss -tnp | grep dockerd | wc -l.
  • docker restart mid-session, verify streams re-establish and stats resume.

… plugins

Consolidate stream ownership onto DockerAgent so each container opens one
persistent Docker stats stream instead of two (one per intrinsic plugin).

- DockerAgent creates and owns a single DockerStatsStreamer in __ainit__,
  closes it in shutdown().
- Agent dispatches start/stop directly from container lifecycle events.
- CPUPlugin / MemoryPlugin receive the shared streamer via attach_stats_streamer;
  per-plugin instantiation is removed.
- Drop AbstractComputePlugin.notify_container_started/destroyed and the
  agent-side dispatcher - ownership now lives on the agent.

Closes #11232
Refs #11216
Refs #11224

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rapsealk added this to the 26.5 milestone on Apr 22, 2026
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
github-actions bot added the size:L (100~500 LoC) and comp:agent (Related to Agent component) labels on Apr 22, 2026
The consolidated DockerStatsStreamer is created in DockerAgent.__ainit__,
but was placed after `await super().__ainit__()`. AbstractAgent's init
calls scan_running_kernels() and starts the lifecycle handler, which on
warm restart can fire container-start events before the streamer exists,
raising AttributeError.

- Move streamer creation + plugin attach loop before super().__ainit__().
- Drop the dead `is not None` guard in shutdown() (annotation is
  non-Optional; assignment happens synchronously before any await).
- Switch the attach loop to hasattr(..., "attach_stats_streamer") so
  plugin subclasses are handled.
- Add a test that asserts the streamer is attached before
  scan_running_kernels runs.

Refs #11232
Refs #11234

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
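The ordering problem described in this commit message can be reproduced in a small sketch. All classes below are hypothetical simplifications; a plain `set` stands in for the streamer, and only the names `__ainit__`, `scan_running_kernels`, and `_on_container_started` come from the PR text.

```python
import asyncio


class AbstractAgent:
    """Hypothetical base: its init replays start events for running kernels."""

    def __init__(self, running: list[str]) -> None:
        self.running = running

    async def __ainit__(self) -> None:
        await self.scan_running_kernels()

    async def scan_running_kernels(self) -> None:
        # On warm restart, already-running containers trigger the start hook.
        for container_id in self.running:
            self._on_container_started(container_id)

    def _on_container_started(self, container_id: str) -> None:
        raise NotImplementedError


class BrokenAgent(AbstractAgent):
    async def __ainit__(self) -> None:
        await super().__ainit__()
        self._stats_streamer = set()  # too late: the hook has already fired

    def _on_container_started(self, container_id: str) -> None:
        self._stats_streamer.add(container_id)


class FixedAgent(AbstractAgent):
    async def __ainit__(self) -> None:
        self._stats_streamer = set()  # created before the base init runs
        await super().__ainit__()

    def _on_container_started(self, container_id: str) -> None:
        self._stats_streamer.add(container_id)


async def init_outcome(agent: AbstractAgent) -> str:
    try:
        await agent.__ainit__()
    except AttributeError:
        return "AttributeError"
    return "ok"


broken_result = asyncio.run(init_outcome(BrokenAgent(["c1"])))
fixed_agent = FixedAgent(["c1"])
fixed_result = asyncio.run(init_outcome(fixed_agent))
```

A regression test along these lines only needs to assert that the attribute exists by the time the scan runs, which is what the final bullet in the commit message describes.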
Comment thread: src/ai/backend/agent/docker/agent.py (Outdated)

self._stats_streamer = DockerStatsStreamer(self.docker)
for computer_ctx in self.computers.values():
    instance = computer_ctx.instance
    if hasattr(instance, "attach_stats_streamer"):

Member

We prefer to avoid dynamic access patterns such as hasattr whenever possible.

Member Author

Good point — removed the hasattr probe in 7cf076a. Moved attach_stats_streamer onto AbstractComputePlugin with a no-op default so non-Docker plugins (K8s, Dummy, third-party accelerators) inherit it safely, and DockerAgent.__ainit__ now calls it unconditionally on every compute plugin.
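The change described in this reply might look roughly like the sketch below. The class bodies are hypothetical and far smaller than the real plugins; only `AbstractComputePlugin`, `attach_stats_streamer`, and the idea of a no-op default come from the comment, and `DummyPlugin` is an illustrative placeholder for any plugin that does not use the streamer.

```python
from __future__ import annotations


class AbstractComputePlugin:
    def attach_stats_streamer(self, streamer: object) -> None:
        """No-op by default; plugins that need the streamer override this."""


class CPUPlugin(AbstractComputePlugin):
    def __init__(self) -> None:
        self._stats_streamer: object | None = None

    def attach_stats_streamer(self, streamer: object) -> None:
        self._stats_streamer = streamer


class DummyPlugin(AbstractComputePlugin):
    """Inherits the no-op and never touches the streamer."""


streamer = object()  # placeholder for the shared streamer instance
plugins = [CPUPlugin(), DummyPlugin()]
for plugin in plugins:
    # Every compute plugin has the method, so no hasattr probe is needed.
    plugin.attach_stats_streamer(streamer)
```

The trade-off is that the base class gains a Docker-specific method, in exchange for a call site that a static type checker can verify.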

rapsealk and others added 2 commits April 24, 2026 11:02
…er on base

Move `attach_stats_streamer` to `AbstractComputePlugin` as a no-op default
so `DockerAgent` can call it unconditionally on every compute plugin
instead of probing with `hasattr`. CPU/Memory plugins keep their
overrides; non-Docker plugins (K8s, Dummy, third-party accelerators)
inherit the no-op safely.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The ordering test stubbed a non-intrinsic plugin with ``object()`` to
exercise the removed ``hasattr`` skip branch; the attach loop now calls
``attach_stats_streamer`` unconditionally on every compute plugin, so
the stub raises ``AttributeError``. Drop the unrelated-plugin case and
the stale docstring reference to the hasattr branch — the no-op default
on ``AbstractComputePlugin`` makes the skip structurally safe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
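The failure mode in this commit message is easy to reproduce in isolation. The sketch below is an assumption-laden illustration, not the project's test code: `attach_all` is a hypothetical helper standing in for the agent's attach loop, and the `Mock(spec=...)` stub is one possible alternative to a bare `object()`, not necessarily what the test suite uses.

```python
from unittest.mock import Mock


class AbstractComputePlugin:
    def attach_stats_streamer(self, streamer: object) -> None:
        """No-op default."""


def attach_all(plugins: list, streamer: object) -> None:
    # Hypothetical stand-in for the unconditional attach loop in the agent.
    for plugin in plugins:
        plugin.attach_stats_streamer(streamer)


shared = object()

# A bare object() has no attach_stats_streamer, so the loop now fails on it.
try:
    attach_all([object()], shared)
    bare_stub_failed = False
except AttributeError:
    bare_stub_failed = True

# A stub constrained to the base-class interface accepts the call.
spec_stub = Mock(spec=AbstractComputePlugin)
attach_all([spec_stub], shared)
spec_stub.attach_stats_streamer.assert_called_once_with(shared)
```

Constraining the stub with `spec` also means a later rename of the method would surface as a test failure rather than pass silently.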
rapsealk changed the title from "refactor(agent): Share a single DockerStatsStreamer across CPU/Memory plugins" to "refactor(BA-5861): Share a single DockerStatsStreamer across CPU/Memory plugins" on Apr 27, 2026